-
Notifications
You must be signed in to change notification settings - Fork 4k
sql: improve observability for statements with hints #157160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
yuzefovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@yuzefovich reviewed 7 of 7 files at r1, 3 of 3 files at r2, 13 of 13 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @michae2)
-- commits line 22 at r2:
Should we do the same for regular EXPLAIN? I.e. the hints should be applied regardless of whether we execute the stmt. Given that we only show the count when non-zero it seems reasonable to show this by default too (i.e. not only in VERBOSE mode).
pkg/sql/opt/exec/execbuilder/testdata/explain_analyze line 291 at r2 (raw file):
query T EXPLAIN ANALYZE SELECT k FROM kv WHERE k >= 100
nit: if we're only interested in one line, perhaps apply LIKE '%hints count%' filter.
michae2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
@michae2 reviewed 7 of 7 files at r1, 3 of 3 files at r2, 13 of 13 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @DrewKimball, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Should we do the same for regular EXPLAIN? I.e. the hints should be applied regardless of whether we execute the stmt. Given that we only show the count when non-zero it seems reasonable to show this by default too (i.e. not only in VERBOSE mode).
Good point, this does seem worth showing in EXPLAIN as well.
pkg/sql/hints/hint_cache.go line 385 at r1 (raw file):
// Wait in intervals of at least 100 milliseconds to avoid busy-waiting. const minWait = time.Millisecond * 100 waitUntil := c.clock.Now().WallTime
When I used only Now() I was getting failures in:
./dev testlogic base --config=local --files=statement_hint_builtins --ignore-cache --stress
(Usually whenever kv-batch-size = 1.)
But if this is passing for you, then I'm happy.
pkg/sql/hints/hint_cache.go line 393 at r1 (raw file):
case <-ctx.Done(): return case <-time.After(max(time.Duration(frontier-waitUntil), minWait)):
I might be misunderstanding, but if frontier <= waitUntil, then won't frontier - waitUntil always be <= 0? Meaning this is always minWait?
DrewKimball
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @michae2, and @yuzefovich)
Previously, michae2 (Michael Erickson) wrote…
Good point, this does seem worth showing in EXPLAIN as well.
Done.
pkg/sql/hints/hint_cache.go line 385 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
When I used only
Now()I was getting failures in:./dev testlogic base --config=local --files=statement_hint_builtins --ignore-cache --stress(Usually whenever kv-batch-size = 1.)
But if this is passing for you, then I'm happy.
I became concerned there was a bug somewhere, so spent some time digging. It turns out that when UseRangeTombstonesForPointDeletes is set for the logic test, we replace SQL deletes with MVCC range tombstones (which are only normally placed for specific ops like TRUNCATE that shouldn't happen for system tables). Rangefeeds don't currently know how to handle that, so the delete is skipped. I think the increased wait time probably "fixes" it because the tombstone is compacted away.
Anyway, with UseRangeTombstonesForPointDeletes disabled for that test, it passes even with no fudge factor in the Await loop.
pkg/sql/hints/hint_cache.go line 393 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
I might be misunderstanding, but if frontier <= waitUntil, then won't frontier - waitUntil always be <= 0? Meaning this is always minWait?
Oops, dumb mistake. Fixed it.
pkg/sql/opt/exec/execbuilder/testdata/explain_analyze line 291 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: if we're only interested in one line, perhaps apply
LIKE '%hints count%'filter.
EXPLAIN ANALYZE can only be a top-level statement. And we can't do it for EXPLAIN either, because the hints won't apply.
1100e9b to
96d57aa
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
yuzefovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuzefovich reviewed 2 of 18 files at r4, 5 of 5 files at r5, 13 of 13 files at r6, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @michae2)
Previously, DrewKimball (Drew Kimball) wrote…
Done.
We should adjust the release note to mention EXPLAIN.
pkg/sql/opt/exec/execbuilder/testdata/explain_analyze line 291 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
EXPLAIN ANALYZE can only be a top-level statement. And we can't do it for EXPLAIN either, because the hints won't apply.
Right, nvm.
pkg/sql/hints/hint_cache.go line 223 at r4 (raw file):
ctx context.Context, update rangefeedcache.Update[*bufferEvent], ) { if update.Type == rangefeedcache.CompleteUpdate || len(update.Events) == 0 {
Can you expand on the reason for this change?
If I'm reading the code right, the rangefeed buffer is flushed every time the frontier TS is bumped (and after the initial scan is complete), and all events are included in the incremental update. (The comment on Buffer says that if the capacity is exceeded when adding an event, then the error is returned. Which I guess we just ignore because we pass nil /* onError */.) In other words, every incremental update corresponds to the frontier TS bump, which is how the code previously behaved.
pkg/sql/opt/exec/execbuilder/testdata/explain line 2591 at r6 (raw file):
query T EXPLAIN SELECT k FROM t_hints WHERE k >= 100
Looks like the hint count is missing here.
Previously, `crdb_internal.await_statement_hints_cache` waited until the hint cache frontier timestamp reached `now()` plus the closed timestamp target duration and refresh interval. The frontier timestamp indicates that we've seen all events up to that timestamp, so it's OK to just wait until `now()` instead. This commit also fixes a flake in the `statement_hint_builtins` logic test due to a test flag `UseRangeTombstonesForPointDeletes`. Epic: None Release note: None
This commit adds a top-level field to the output of `EXPLAIN` and `EXPLAIN ANALYZE` showing the number of hints applied to the statement, if nonzero. Informs cockroachdb#121502 Release note (sql change): EXPLAIN and EXPLAIN ANALYZE will now display the number of hints from `system.statement_hints` applied to the executed statement.
This commit adds a field to the "plan details" shown for statements in the "statement activity" page of the DB console. The new field indicates whether any statement hints from `system.statement_hints` were applied to the statement. Informs cockroachdb#121502 Release note (sql change): Plan details in the "statement activity" page of the DB console now show whether any hints from `system.statement_hints` werer applied to the statement execution.
96d57aa to
1d578a3
Compare
DrewKimball
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @michae2, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We should adjust the release note to mention EXPLAIN.
Done.
pkg/sql/hints/hint_cache.go line 223 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Can you expand on the reason for this change?
If I'm reading the code right, the rangefeed buffer is flushed every time the frontier TS is bumped (and after the initial scan is complete), and all events are included in the incremental update. (The comment on
Buffersays that if the capacity is exceeded when adding an event, then the error is returned. Which I guess we just ignore because we passnil /* onError */.) In other words, every incremental update corresponds to the frontier TS bump, which is how the code previously behaved.
Yes, I think you're right. I had convinced myself that we might be receiving multiple incremental updates at the same timestamp, but that was before I found the thing with range tombstones.
pkg/sql/opt/exec/execbuilder/testdata/explain line 2591 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Looks like the hint count is missing here.
Done. Just needed to regenerate the test.
michae2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michae2 reviewed 1 of 18 files at r4, 19 of 19 files at r7, 5 of 5 files at r8, 13 of 13 files at r9, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @angles-n-daemons, @DrewKimball, and @yuzefovich)
pkg/sql/hints/hint_cache.go line 385 at r1 (raw file):
Ahh that explains it! Nice digging!
which are only normally placed for specific ops like TRUNCATE that shouldn't happen for system tables
Interesting. I guess we will need to remember that TRUNCATE won't be picked up by the cache unless we follow it with another mutation like a DELETE FROM system.statement_hints.
yuzefovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuzefovich reviewed 19 of 19 files at r7, 5 of 5 files at r8, 13 of 13 files at r9, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @angles-n-daemons and @DrewKimball)
-- commits line 41 at r9:
nit: s/werer/were/.
pkg/sql/hints/hint_cache.go line 389 at r7 (raw file):
// Await is only used for testing, so we don't need to wake up immediately. We // can get away with polling the frontier time. for frontier := c.frontier.Load(); frontier <= waitUntil; frontier = c.frontier.Load() {
nit: not important (and perhaps impossible), but why should we wait when frontier == waitUntil?
DrewKimball
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @angles-n-daemons, @michae2, and @yuzefovich)
pkg/sql/hints/hint_cache.go line 385 at r1 (raw file):
Interesting. I guess we will need to remember that TRUNCATE won't be picked up by the cache unless we follow it with another mutation like a DELETE FROM system.statement_hints.
There might be other ways it could happen, but TRUNCATE at least isn't allowed for system tables. I guess that's because it drops and recreates the table?
pkg/sql/hints/hint_cache.go line 389 at r7 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: not important (and perhaps impossible), but why should we wait when
frontier == waitUntil?
Might not matter in practice, but we're comparing wall time here, so even if it's equal, the logical frontier timestamp could technically still be earlier than now().
|
TFYRs! bors r+ |
|
Build succeeded: |
builtins: decrease wait time for await_statement_hints_cache
Previously,
crdb_internal.await_statement_hints_cachewaited untilthe hint cache frontier timestamp reached
now()plus the closedtimestamp target duration and refresh interval. The frontier timestamp
indicates that we've seen all events up to that timestamp, so it's OK
to just wait until
now()instead.Epic: None
Release note: None
explain: display number of hints for EXPLAIN ANALYZE
This commit adds a top-level field to the
EXPLAIN ANALYZEoutputshowing the number of hints applied to the statement if nonzero.
Informs #121502
Release note (sql change): EXPLAIN ANALYZE will now display the number
of hints from
system.statement_hintsapplied to the executed statement.ui: show whether statement hints were applied in DB console
This commit adds a field to the "plan details" shown for statements in
the "statement activity" page of the DB console. The new field indicates
whether any statement hints from
system.statement_hintswere appliedto the statement.
Informs #121502
Release note (sql change): Plan details in the "statement activity" page
of the DB console now show whether any hints from
system.statement_hintswerer applied to the statement execution.